Trust-Aware Federated Learning for Network Intrusion Detection

Abstract

This work presents a Trust-Aware Federated Intrusion Detection System (IDS) trained on the CICIDS-2017 benchmark — a dataset of ~2.5 million labelled network flow records spanning 15 traffic classes. The system combines a dual-head autoencoder (simultaneous reconstruction and binary classification) with a Federated Learning (FL) framework in which each participating client retains its raw traffic data locally; only model weight updates are communicated to the central aggregation server. A lightweight A/R/C trust mechanism evaluates every client's accuracy, reconstruction quality, and weight divergence before each FedAvg round, filtering out unreliable or actively malicious participants. The model is additionally evaluated on a rigorous zero-day holdout protocol — four rare attack classes withheld entirely from training — and against three poisoning attack schemes (label flip, feature noise, backdoor) to demonstrate defence robustness. Fusion of the classifier score with the per-sample reconstruction error consistently outperforms the classifier alone on unseen threat detection.

1. Introduction

Modern network infrastructure is exposed to an ever-growing catalogue of cyber threats — from volumetric denial-of-service floods to stealthy low-and-slow infiltration campaigns. Traditional IDS approaches centralise raw traffic logs on a single training server, which creates significant privacy and regulatory risks for organisations operating across jurisdictions or sharing sensitive operational data.

Federated Learning (FL) offers a compelling alternative: clients train local model replicas on their own private data shards, then contribute only weight updates to a global model via a central aggregation server. No raw packets or flow records ever leave the originating node. This data locality is attractive for industrial control systems, healthcare networks, and inter-organisational threat intelligence sharing, all domains where data sovereignty is non-negotiable.

However, FL introduces its own threat surface. A malicious client can submit crafted weight updates designed to degrade the global model; such attacks are known as poisoning attacks. The distributed training paradigm also makes it harder to detect novel attack patterns: a single client shard may contain too few samples of a rare attack type for the model to generalise. This work addresses both challenges through a unified architecture.

Core Research Questions

Can a federated IDS match centralised performance on seen attacks?
Does the reconstruction head of a dual autoencoder provide a reliable signal for zero-day threats?
Does the A/R/C trust gate mitigate poisoning without knowing which clients are malicious?

2. About the Dataset — CICIDS-2017

The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset is one of the most widely used benchmarks in network security research. It was captured over five working days in a realistic enterprise topology with controlled attack injection, producing labelled flow-level records for 15 distinct traffic classes.

Property	Value
Total samples	~2.5 million (after deduplication)
Features	63 numeric flow statistics (after dropping 8 constant columns)
Traffic classes	15 (1 benign + 14 attack types)
Capture period	Monday–Friday, July 3–7 2017
Source files	8 daily CSV files, ~100–400 MB each
Majority class	BENIGN (~83% of all records)
Rarest class	Heartbleed (11 samples total)

Class Distribution

Figure 1: Class distribution in CICIDS-2017 — severe imbalance between BENIGN and rare attack types

Zero-Day Holdout Classes

Four rare, high-impact classes are designated as zero-day and are completely excluded from all training and validation sets. They are evaluated only at test time to measure the model's ability to generalise to previously unseen attack types:

Zero-Day Class	Total Samples	Threat Category
`Heartbleed`	11	Memory disclosure / TLS vulnerability
`Infiltration`	36	Multi-stage lateral movement
`Web_Attack_SQL_Injection`	21	Web application exploit
`Web_Attack_XSS`	652	Cross-site scripting

3. Research Methodology

The methodology is structured in five stages: data loading and cleaning, feature engineering and class balancing, zero-day split and artifact persistence, federated client partitioning, and trust-aware federated training with dual evaluation (seen attacks and zero-day).

3.1 Hardened Data Loading & Preprocessing

The CICIDS-2017 dataset is distributed across 8 separate CSV files covering the five capture days. A robust loading pipeline was implemented to handle the encoding inconsistencies common in this dataset:

Deterministic file enumeration

Files are sorted before loading to guarantee reproducibility regardless of filesystem ordering.

Column name sanitisation

UTF-8 byte sanitisation strips BOM characters; strip() removes leading/trailing whitespace. The common pitfall of ' Label' ≠ 'Label' is handled automatically.

Label harmonisation

Unicode dashes in "Web Attack" class names differ between files. All variants are normalised to underscore-separated ASCII strings (e.g., Web_Attack_BruteForce).
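The two string-cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual loader: the function names, the regular expressions, and the `ALIASES` table are assumptions made here so that the output matches the class names used in this document.

```python
import re
import unicodedata

# Hypothetical alias table: maps the generic underscore form onto the
# spellings used elsewhere in this document.
ALIASES = {
    "Web_Attack_Brute_Force": "Web_Attack_BruteForce",
    "Web_Attack_Sql_Injection": "Web_Attack_SQL_Injection",
}

def clean_column(name):
    """Drop a BOM, trim whitespace, and join words with underscores."""
    name = name.replace("\ufeff", "").strip()
    return re.sub(r"\s+", "_", name)

def harmonise_label(label):
    """Reduce a raw class label to an underscore-separated ASCII string."""
    text = unicodedata.normalize("NFKD", label)
    # Discard anything outside ASCII, e.g. en dashes or U+FFFD left by mis-decoding.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace, hyphens, and underscores into one underscore.
    text = re.sub(r"[\s\-_]+", "_", text.strip()).strip("_")
    return ALIASES.get(text, text)
```

With these rules, `' Label'` and `'Label'` collapse to the same column name, and every dash variant of a "Web Attack" label resolves to a single class string.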

Constant column removal

Eight zero-variance features (e.g., Bwd_PSH_Flags, Fwd_Avg_BytesBulk) are detected and dropped automatically, reducing noise for downstream models.

Inf / NaN imputation

Division-by-zero packet rates produce ±inf; these are replaced with NaN and imputed using per-column median values (robust to skewed distributions).
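A compact NumPy sketch of the constant-column removal and the Inf / NaN imputation follows. It assumes the features are already in a numeric matrix; the function name `clean_numeric` is hypothetical, and the real pipeline may operate on DataFrames instead.

```python
import numpy as np

def clean_numeric(X):
    """Return (cleaned matrix, boolean mask of the columns that were kept)."""
    X = np.array(X, dtype=float, copy=True)
    # Treat +inf and -inf as missing values.
    X[~np.isfinite(X)] = np.nan
    # Drop zero-variance columns (variance computed while ignoring NaN).
    keep = np.nanstd(X, axis=0) > 0
    X = X[:, keep]
    # Fill every remaining NaN with the median of its own column.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X, keep
```

The median is used rather than the mean because a handful of extreme flow rates would otherwise dominate the imputed value.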

Label encoding

Two parallel targets are created: Label_Binary (0 = BENIGN, 1 = any attack) and Label_MultiClass (integer codes 0–14 via LabelEncoder).
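The two targets can be derived in a few lines. The sketch below uses `np.unique` as a stand-in for `LabelEncoder` (both assign integer codes in sorted class order); the function name `encode_labels` is an assumption.

```python
import numpy as np

def encode_labels(labels):
    """Build the binary and multi-class targets from harmonised label strings."""
    labels = np.asarray(labels)
    # 0 = BENIGN, 1 = any attack.
    label_binary = (labels != "BENIGN").astype(int)
    # Integer codes in sorted class order, plus the class list for the class map.
    classes, label_multiclass = np.unique(labels, return_inverse=True)
    return label_binary, label_multiclass, classes.tolist()
```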

3.2 Class Imbalance — Targeted SMOTE

The BENIGN class accounts for ~83% of samples. A model trained naively could predict BENIGN for every flow and still reach roughly 83% accuracy while detecting no attacks. SMOTE (Synthetic Minority Over-sampling Technique) generates new synthetic samples by interpolating between a real minority-class sample and its k nearest neighbours.

SMOTE is applied only to the three rarest classes and only within the training fold to prevent data leakage:

Class	Before SMOTE	After SMOTE	k_neighbors used
`Heartbleed`	~8	5,000	2
`Infiltration`	~25	5,000	2
`Web_Attack_SQL_Injection`	~15	5,000	2

Implementation note: SMOTE is applied only to the isolated rare-class rows, then the oversampled subset is recombined with the non-rare training rows. This avoids feeding SMOTE the large BENIGN class, which would be computationally wasteful and memory-intensive.
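To make the interpolation step concrete, here is a hand-rolled NumPy sketch of the core SMOTE idea applied to one isolated rare class. It is illustrative only: in practice a library implementation such as the `SMOTE` class from imbalanced-learn would normally be used, and the function name `smote_sketch` and its arguments are assumptions.

```python
import numpy as np

def smote_sketch(X_min, n_new, k_neighbors=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k_neighbors]
    # Pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k_neighbors, size=n_new)]
    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

Because every synthetic row lies on a segment between two real rows, a class with only ~8 real samples yields 5,000 points that all sit inside the region spanned by those few samples, which is why a small k_neighbors is required.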

3.3 Zero-Day Holdout Protocol & Data Persistence

Before any splitting, the four zero-day classes are extracted into a completely separate slice (X_zd, y_zd). The remaining data forms the training pool from which stratified train / val / test splits are created.

StandardScaler is fit exclusively on X_train; transform-only is applied to all other splits. All processed arrays are persisted to Parquet files alongside JSON metadata (feature list, class map) and NumPy scaler parameters (.npy) for fast, reproducible reloads across sessions.
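The holdout extraction and the train-only scaling can be sketched as follows. The helper names are hypothetical, and the scaler is re-implemented in NumPy (mean and population standard deviation, as `StandardScaler` uses) purely so the example is self-contained.

```python
import numpy as np

ZERO_DAY = ["Heartbleed", "Infiltration", "Web_Attack_SQL_Injection", "Web_Attack_XSS"]

def split_zero_day(X, labels, zero_day=ZERO_DAY):
    """Separate zero-day rows from the pool used for train / val / test."""
    labels = np.asarray(labels)
    is_zd = np.isin(labels, zero_day)
    return (X[~is_zd], labels[~is_zd]), (X[is_zd], labels[is_zd])

def fit_scaler(X_train):
    """Estimate scaling parameters from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against division by zero
    return mean, std

def apply_scaler(X, mean, std):
    """Transform any split with parameters learned on X_train."""
    return (X - mean) / std
```

Fitting on X_train alone means that no statistic of the validation, test, or zero-day rows influences the features the model is trained on.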

3.4 Non-IID Client Partitioning (Dirichlet Split)

In a real federation, data is not uniformly distributed across clients: a university campus network sees different traffic than a hospital or financial institution. To simulate this heterogeneity, a Dirichlet(α) distribution is used to partition training data across K = 5 clients.

For each class c, a proportion vector is drawn from Dirichlet(α · 1_K) and used to allocate that class's samples among clients. Smaller values of α produce more skewed client shards:

α value	Data heterogeneity	Description
0.1	Highly non-IID	Each client sees mostly one traffic class
0.5	Moderate non-IID	Default — typical research setting
10.0	Near-IID	Each client holds a representative sample
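A minimal sketch of the per-class Dirichlet allocation is shown below. The function name and the seeding scheme are assumptions; only K = 5 and α = 0.5 come from the text.

```python
import numpy as np

def dirichlet_partition(y, n_clients=5, alpha=0.5, seed=0):
    """Return one array of sample indices per client."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(y):
        # Shuffle the rows of class c, then cut them by Dirichlet proportions.
        idx = rng.permutation(np.where(y == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]
```

Every sample is assigned to exactly one client, so the shards are disjoint and together cover the whole training set. With small α, some clients may receive few or no samples of a given class.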

3.5 Model Architecture — Dual-Head Autoencoder

The core model is a dual-head autoencoder built in TensorFlow 2.15 / Keras. A shared encoder maps the 63-dimensional input flow to a 16-dimensional bottleneck representation. Two separate heads branch from this bottleneck:

Figure 2: Dual-head autoencoder — shared encoder bottleneck feeds both a reconstruction decoder and a binary classifier

The model is trained with a weighted combined loss:

ℒ = λ · MSE(X, X̂) + (1 − λ) · BCE(y, ŷ)

λ = 0.5 (default) | Increasing λ → higher zero-day recall | Decreasing λ → higher supervised F1

Reconstruction head (MSE): Forces the bottleneck to capture the structure of normal traffic. Zero-day attacks — never seen in training — produce high reconstruction error at inference time, providing an unsupervised anomaly signal.
Classification head (BCE): Directly supervises the attack-vs-benign decision on the seen attack classes, learning the discriminative features that separate malicious from benign flows.
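The combined objective can be written out numerically as below. This is a NumPy sketch of the formula rather than the training code itself; in Keras the same weighting would typically be expressed through the `loss_weights` argument of `model.compile`. The function name and the clipping constant `eps` are assumptions.

```python
import numpy as np

def combined_loss(x, x_hat, y, y_prob, lam=0.5, eps=1e-7):
    """lam * MSE(x, x_hat) + (1 - lam) * BCE(y, y_prob)."""
    mse = np.mean((x - x_hat) ** 2)
    # Clip probabilities so the logarithms stay finite.
    p = np.clip(y_prob, eps, 1 - eps)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return lam * mse + (1 - lam) * bce
```

Setting lam to 1 trains a pure autoencoder and setting it to 0 trains a pure classifier; the default of 0.5 weights both heads equally.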

4. A/R/C Trust Mechanism

After each local training round, the central server evaluates every client's updated model on the shared validation set and computes three trust signals. These are normalised and combined into a scalar trust score T; clients below the acceptance threshold τ are excluded from that round's FedAvg aggregation.

Accuracy

Validation accuracy of the client's locally updated model. A model corrupted by poisoning will generalise poorly.

↑ Higher = more trusted

Reconstruction

Validation reconstruction MSE. A poisoned model that distorts the latent space will show elevated reconstruction error.

↓ Lower = more trusted

Cosine Distance

L₂ norm of the difference between the client's weight vector and the current global weights. Large deviations signal abnormal updates.

↓ Lower = more trusted

T = α · A_norm + β · R_norm + γ · C_norm

Default weights: α = 0.4 | β = 0.3 | γ = 0.3 | Acceptance threshold τ = 0.60

Each round, clients are ranked by T in descending order. The top TOPK = ⌈0.6 × K⌉ clients with T ≥ τ are accepted for FedAvg aggregation. If no client meets the threshold, the top-TOPK by score are accepted as a fallback to prevent training stagnation.
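The scoring and gating rule can be sketched as follows. The text does not specify how the three signals are normalised, so this example assumes min-max scaling across the clients of a round, inverted for the two lower-is-better signals. The function names are hypothetical, and the three inputs are taken as already computed per client.

```python
import math
import numpy as np

def normalise(values, higher_is_better=True):
    """Min-max scale to [0, 1] so that 1 always means most trusted (assumed scheme)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.ones_like(v)  # identical clients: nobody is penalised
    scaled = (v - v.min()) / span
    return scaled if higher_is_better else 1.0 - scaled

def trust_gate(acc, recon_mse, divergence, weights=(0.4, 0.3, 0.3),
               tau=0.60, top_frac=0.6):
    """Return (trust scores, indices of accepted clients in rank order)."""
    alpha, beta, gamma = weights
    t = (alpha * normalise(acc)
         + beta * normalise(recon_mse, higher_is_better=False)
         + gamma * normalise(divergence, higher_is_better=False))
    top_k = math.ceil(top_frac * len(t))
    ranked = np.argsort(-t, kind="stable")[:top_k]
    accepted = [int(i) for i in ranked if t[i] >= tau]
    if not accepted:
        # Fallback: keep the top-k by score so training does not stall.
        accepted = [int(i) for i in ranked]
    return t, accepted
```

With K = 5 the gate accepts at most ⌈0.6 × 5⌉ = 3 clients per round, so two poisoned clients can be excluded even when their scores are close to the threshold.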

Figure 3: Per-client trust scores T across 8 federated learning rounds

Figure 4: Round × Client acceptance heatmap (1 = accepted, 0 = rejected)

Figure 5: Stacked bar — mean A/R/C contribution to trust score per client

5. Federated Training Loop

Training runs for 8 communication rounds. Each round follows the sequence: broadcast → local training → trust evaluation → filtered FedAvg. The entire loop is logged to CSV files for reproducibility.

Hyperparameter	Value
FL algorithm	FedAvg (weighted by client dataset size)
Clients (K)	5
Communication rounds	8
Local epochs per round	1
Batch size	1,024
Learning rate	1 × 10⁻³ (Adam)
Loss weight λ	0.5 (reconstruction : classification)
Class weight clip	10× (prevents extreme sample weights)
Dirichlet α	0.5 (moderate non-IID)
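The size-weighted FedAvg step listed in the table can be sketched in NumPy. Here each client's model is represented as a list of layer arrays, the shape returned by Keras `model.get_weights()`; the function name `fedavg` is an assumption, and only the clients accepted by the trust gate would be passed in.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average layer-wise, weighting each client by its number of samples."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]
```

The result can be loaded into the global model (for example with `model.set_weights`) and broadcast at the start of the next round.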

Figure 6: AUROC and F1 trajectories for validation, test, and zero-day sets across 8 FL rounds

6. Results & Evaluation

6.1 Seen-Attack Performance

After the final FL round, the global model is evaluated on the held-out test set, which contains only BENIGN and the 10 seen attack classes:

Metric	Typical value
Test AUROC	≥0.99
Test F1	≥0.97
Val AUROC	≥0.98
Recon MSE	<0.05
Trust τ	0.60
FL Rounds	8

Note: Exact metric values will vary between runs due to stochastic FL training and Dirichlet partitioning. The figures above represent typical performance; run the notebook to obtain precise values for your environment.

6.2 Zero-Day Detection

At inference time, the model can detect zero-day attacks through two complementary strategies. The evaluation is performed on a balanced mixed zero-day set composed of all held-out zero-day samples plus an equal number of benign samples drawn from the test set:

Strategy	Score used	Threshold	Advantage
Classifier-only	P(attack) from sigmoid head	τ* tuned on PR curve	Simple deployment — single forward pass
Fusion score	λ · P(attack) + (1−λ) · MSE_norm	Grid search over λ and τ	Consistently higher zero-day recall
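The fusion strategy in the table can be sketched as a small grid search. The sketch assumes that MSE_norm is a min-max scaling of the per-sample reconstruction error and that F1 is the selection criterion; the grids and function names are illustrative choices, not the notebook's actual settings.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 for the positive (attack) class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def fusion_grid_search(y_true, p_attack, recon_mse,
                       lams=np.linspace(0, 1, 11),
                       taus=np.linspace(0.05, 0.95, 19)):
    """Return (best F1, lambda, tau) for lam * P(attack) + (1 - lam) * MSE_norm."""
    span = recon_mse.max() - recon_mse.min()
    mse_norm = (recon_mse - recon_mse.min()) / span if span else np.zeros_like(recon_mse)
    best = (-1.0, None, None)
    for lam in lams:
        fused = lam * p_attack + (1 - lam) * mse_norm
        for tau in taus:
            f1 = f1_binary(y_true, (fused >= tau).astype(int))
            if f1 > best[0]:
                best = (float(f1), float(lam), float(tau))
    return best
```

Restricting `lams` to the single value 1.0 recovers the classifier-only strategy, which makes the two rows of the table directly comparable on the same data.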

Figure 7: PR curve on zero-day mixed set with optimal τ* marked

Figure 8: ROC curve on zero-day mixed set

Figure 9: Zero-day threshold sweep — F1 is maximised at τ* (optimal operating point)

Figure 10: Grid search over λ — optimal blend of classifier score and reconstruction MSE

6.3 Score & Reconstruction Distributions

Figure 11: Classifier score and reconstruction MSE distributions for benign vs zero-day traffic

6.4 Confusion Matrices (Zero-Day)

In a security context, False Negatives are the most costly error — a missed attack may result in a breach. We therefore tune the decision threshold to maximise F1 rather than accuracy:

Figure 12a: Confusion matrix — classifier-only strategy at optimal threshold τ*

Figure 12b: Confusion matrix — fusion strategy at optimal λ and τ (typically higher recall)
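Each confusion matrix above reduces to four counts at a chosen threshold. The sketch below shows how those counts and the resulting recall could be computed; the function names are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, scores, tau):
    """Count TN, FP, FN, TP when scores >= tau are flagged as attacks."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= tau).astype(int)
    return {
        "tn": int(np.sum((y_true == 0) & (y_pred == 0))),
        "fp": int(np.sum((y_true == 0) & (y_pred == 1))),
        "fn": int(np.sum((y_true == 1) & (y_pred == 0))),
        "tp": int(np.sum((y_true == 1) & (y_pred == 1))),
    }

def recall(counts):
    """Share of true attacks that were detected."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0
```

Lowering the threshold trades false positives for fewer false negatives, which is the direction that matters most when a missed attack is the costliest error.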

6.5 Bottleneck Embeddings (t-SNE)

The 16-dimensional bottleneck representation z is projected to 2D using t-SNE to visualise how well the encoder separates benign from attack traffic in the latent space:

Figure 13a: t-SNE of bottleneck embeddings — test set (benign vs seen attacks)

Figure 13b: t-SNE of bottleneck embeddings — zero-day mixed set (benign vs unseen attacks)

7. Robustness Against Poisoning Attacks

In a federated setting, a malicious participant can submit crafted weight updates designed to degrade the global model. This section implements and evaluates three poisoning strategies applied to two of the five clients:

Attack	Mechanism	Parameters
Label Flip	Randomly flip 35% of training labels (0→1 or 1→0) — corrupts the supervised signal	`flip_frac = 0.35`
Feature Noise	Add Gaussian noise (μ=0, σ=3.5) to 35% of feature vectors — corrupts input distribution	`noise_frac = 0.35, noise_std = 3.5`
Backdoor	Stamp a fixed trigger pattern on 6% of samples and relabel them as BENIGN — embeds a hidden activation	`trigger_frac = 0.06, n_feats = 4, trigger_val = 15.0`
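The three attacks in the table can be sketched as simple transformations of a client's local training data. The parameter names and defaults follow the table; the function names, the seeding, and the choice to stamp the trigger onto the first `n_feats` columns are assumptions made for illustration.

```python
import numpy as np

def label_flip(y, flip_frac=0.35, seed=0):
    """Flip a random fraction of binary labels (0 -> 1, 1 -> 0)."""
    rng = np.random.default_rng(seed)
    y = np.array(y, copy=True)
    idx = rng.choice(len(y), size=round(flip_frac * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

def feature_noise(X, noise_frac=0.35, noise_std=3.5, seed=0):
    """Add zero-mean Gaussian noise to a random fraction of rows."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float, copy=True)
    idx = rng.choice(len(X), size=round(noise_frac * len(X)), replace=False)
    X[idx] += rng.normal(0.0, noise_std, size=X[idx].shape)
    return X

def backdoor(X, y, trigger_frac=0.06, n_feats=4, trigger_val=15.0, seed=0):
    """Stamp a fixed trigger on some rows and relabel them as BENIGN (0)."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float, copy=True)
    y = np.array(y, copy=True)
    idx = rng.choice(len(X), size=round(trigger_frac * len(X)), replace=False)
    X[idx, :n_feats] = trigger_val  # assumed trigger location: first n_feats columns
    y[idx] = 0
    return X, y
```

Because the features are standardised, a trigger value of 15.0 lies far outside the normal range of any column.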

7.1 Experimental Design

Three experiments isolate the effect of the trust mechanism:

Experiment	Configuration	Purpose
EXP 1	Clean + Trust ON	No poisoning applied; trust gate active. Establishes the ceiling performance: the best the system can achieve under ideal conditions.
EXP 2	Poisoned + Trust OFF	Two clients poisoned; trust gate disabled, so all clients are accepted every round. Measures the attack damage to the global model.
EXP 3	Poisoned + Trust ON	Two clients poisoned; trust gate active, so A/R/C filters low-trust clients. Measures the defence recovery relative to EXP 1.

Expected outcome: A poisoned client will exhibit low validation accuracy (↓ A_norm), high reconstruction error (↓ R_norm), and large weight deviation (↓ C_norm). All three normalised signals push T below the acceptance threshold τ, causing automatic exclusion without any knowledge of which clients are malicious.

7.2 Poisoning Experiment Results

Figure 14: Poisoning experiment comparison — Trust ON (EXP 3) recovers performance lost under poisoning (EXP 2)

Experiment	Test AUROC	Test F1	Zero-Day AUROC	Zero-Day F1 (best τ)
✅ Clean + Trust ON	baseline	baseline	baseline	baseline
☠️ Poisoned + Trust OFF	↓ degraded	↓ degraded	↓ degraded	↓ degraded
🛡️ Poisoned + Trust ON	≈ baseline	≈ baseline	≈ baseline	≈ baseline

8. Conclusion

This work demonstrates an end-to-end Trust-Aware Federated Intrusion Detection System that achieves strong performance on both seen and unseen attack types while preserving the privacy of each participant's raw network data. The key contributions are:

A dual-head autoencoder that simultaneously learns supervised binary classification and unsupervised reconstruction, enabling anomaly detection for zero-day threats through elevated reconstruction error — no labelled examples of the new attack are required.
A Federated Learning framework with Dirichlet non-IID client partitioning, per-client balanced class weighting, and GPU-optimised tf.data pipelines.
An A/R/C trust gate that evaluates each client across three complementary dimensions before every aggregation round, filtering malicious participants without any prior knowledge of who is compromised.
A rigorous zero-day evaluation protocol and a fusion scoring strategy (classifier probability + normalised reconstruction MSE) that consistently outperforms the classifier head alone on unseen attack types.
Demonstrated robustness to three poisoning attack schemes (label flip, feature noise, backdoor) — the trust mechanism substantially recovers performance compared to the undefended baseline.

Future Directions

Differential privacy integration (DP-SGD) to provide formal privacy guarantees in addition to the data-locality guarantee of FL.
Adaptive λ scheduling — anneal from high reconstruction weight (strong zero-day sensitivity) toward lower weights (improved supervised F1) as training matures.
Cross-dataset evaluation on CSE-CIC-IDS2018 and UNSW-NB15 to assess generalisability beyond the CICIDS-2017 benchmark.
Asynchronous FL to handle client drop-out and stragglers in realistic deployment scenarios.
Explainability — SHAP or attention-based attribution of bottleneck activations to identify the most discriminative flow features per attack type.
